Abstract

H1N1 influenza causes substantial seasonal illness and was the subtype of the 2009 influenza pandemic. Precise measures of antigenic distance between the vaccine and circulating virus strains help researchers design influenza vaccines with high vaccine effectiveness. Here we introduce a sequence-based method to predict vaccine effectiveness in humans. Historical epidemiological data show that this sequence-based method is as predictive of vaccine effectiveness as hemagglutination inhibition (HI) assay data from ferret animal model studies. Interestingly, the expected vaccine effectiveness is greater against H1N1 than against H3N2, suggesting a stronger immune response against H1N1 than against H3N2. The evolution rate of hemagglutinin in H1N1 is also shown to be greater than that in H3N2, presumably due to greater immune selection pressure.
1 Introduction

The annual trivalent vaccine for influenza contains one H3N2 strain, one H1N1 strain, and one influenza B strain. This vaccine is currently the primary tool to prevent influenza infection and to control influenza epidemics. Because the influenza virus evolves rapidly, the components of the influenza vaccine are changed in many flu seasons. Even though the vaccine is usually redesigned to closely match the newly evolved influenza virus strains, there has occasionally been a suboptimal match between vaccine and virus. Partly for this reason, vaccine effectiveness has varied from year to year. The desire for a vaccine with high effectiveness makes the prediction of the circulating influenza strain for the next influenza season a key step in vaccine design. A goal of the WHO is to recommend vaccine strains for the next flu season that will have the smallest antigenic distances to the dominant circulating strains in that season, which often means using the dominant circulating strains in the current flu season as a reference.

A variety of distance measures have been developed to evaluate the degree of match between the vaccine strain and the dominant circulating strain. The hemagglutinin protein (HA) of influenza is the primary focus of this distance calculation, because hemagglutinin is the dominant antigen for protective human antibodies and exhibits the highest evolutionary rate among all the influenza genes (Rambaut et al. 2008). A widely used definition of antigenic distance is calculated from hemagglutination inhibition data from ferret animal model studies. To compare a pair of strains, a 2-by-2 HI titer matrix is built, and the antigenic distance is extracted from this matrix. This distance can be further refined by a dimensional projection technique termed antigenic cartography (Smith et al. 2004). The mathematical basis of antigenic cartography is dimension reduction of the shape space, in which each point represents an influenza virus strain and the distance between a pair of points represents the antigenic distance between the corresponding strains. Note that antigenic cartography does not yield the distance data itself; rather, it assesses the distance between the given vaccine strain and dominant circulating strain by globally considering all the strains and the antigenic distances among them. In the original antigenic cartography literature (Smith et al. 2004), hemagglutination inhibition data were the input to the cartography algorithm that produces the final distances. Antigenic distances can also be defined from the amino acid sequences of the strains using computer-aided methods: the fraction of substituted amino acids in the dominant hemagglutinin epitope bound by antibody, termed p_epitope, is a sequence-based antigenic distance measure (Deem and Pan 2009; Gupta et al. 2006; Pan and Deem 2009). The amino acid sequences are downloaded from databases and processed to obtain these distance measures. The p_epitope sequence-based method has been shown to be an effective antigenic distance measure between two strains of H3N2 (Deem and Lee 2003; Gupta et al. 2006; Pan and Deem 2009). To be clear, antigenic distance is a quantity that should define the difference between viral strains as determined by the human immune system. Ferret HI data are not the only, or even the best, measure of antigenic distance.

The vaccine effectiveness, which varies from year to year, correlates with the antigenic distance between the vaccine strain and the dominant circulating strain. Thus vaccine effectiveness can be predicted by calculating the antigenic distance. Such an a priori estimate of vaccine effectiveness guides health authorities in choosing the appropriate vaccine strain for the coming flu season. For H3N2 influenza, the p_epitope method offers a prediction of vaccine effectiveness that has a higher correlation coefficient with vaccine effectiveness in humans than do distances derived by other methods (Gupta et al. 2006; Pan and Deem 2009). In this paper, we develop the p_epitope method for H1N1 influenza. In Materials and Methods we describe the epidemiological data used to calculate vaccine effectiveness and the animal model or sequence data used to calculate antigenic distance. In Results we show the correlation of antigenic distance with vaccine effectiveness. We discuss the results in the Discussion.



2 Materials and Methods 

2.1 Identities of Vaccine Strains and Dominant Circulating Strains 

The WHO selects vaccine strains each year following a standard procedure. The vaccine strains are reviewed every year and are usually changed every two to three years. We used the H1N1 vaccine strains and H1N1 dominant circulating strains reported in the epidemiological literature that provided the vaccine effectiveness data used in this study.






2.2 Estimation of Vaccine Effectiveness 



The H1N1 vaccine effectiveness was gathered from the epidemiological literature on the influenza-like illness rates of unvaccinated (u) and vaccinated (v) people. Vaccine effectiveness is defined as

$$\text{vaccine effectiveness} = \frac{u - v}{u}. \qquad (1)$$

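To calculate vaccine effectiveness and its standard error, we let N_u and N_v denote the numbers of subjects in the unvaccinated and vaccinated groups, and n_u and n_v the numbers of illnesses in the unvaccinated and vaccinated groups, respectively. The values and the standard errors of u, v, and vaccine effectiveness are

$$u = n_u / N_u \qquad (2)$$

$$v = n_v / N_v \qquad (3)$$

$$\sigma_u = \sqrt{u(1-u)/N_u} \qquad (4)$$

$$\sigma_v = \sqrt{v(1-v)/N_v} \qquad (5)$$

$$\sigma_{\rm VE} = \frac{v}{u}\sqrt{\left(\frac{\sigma_u}{u}\right)^2 + \left(\frac{\sigma_v}{v}\right)^2} \qquad (6)$$

If the vaccine effectiveness is averaged from N studies,

$$\sigma_{\rm VE} = \frac{1}{N}\left(\sum_{i=1}^{N} \sigma_{{\rm VE},i}^2\right)^{1/2}, \qquad (7)$$

where σ_VE,i is the standard error of the i-th study.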
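As a check on equations (1)-(3) and (7), the sketch below computes vaccine effectiveness and its standard error from illness counts. It assumes binomial standard errors for the attack rates, σ_u = sqrt(u(1-u)/N_u) and σ_v = sqrt(v(1-v)/N_v), and first-order error propagation for the ratio v/u; under these assumptions it reproduces the 37.0 ± 12.0% (1982-83) and 32.2 ± 5.8% (1995-96 (b), two studies averaged) entries of Table 2. The function names are illustrative only.

```python
import math


def vaccine_effectiveness(n_u: int, N_u: int, n_v: int, N_v: int) -> tuple[float, float]:
    """Return (VE, SE) for one study from illness counts.

    VE follows eqs. (1)-(3). The SE ASSUMES binomial errors for the attack
    rates u and v and first-order error propagation for VE = 1 - v/u.
    """
    u = n_u / N_u  # illness rate among unvaccinated subjects
    v = n_v / N_v  # illness rate among vaccinated subjects
    ve = (u - v) / u
    sigma_u = math.sqrt(u * (1 - u) / N_u)
    sigma_v = math.sqrt(v * (1 - v) / N_v)
    se = (v / u) * math.sqrt((sigma_u / u) ** 2 + (sigma_v / v) ** 2)
    return ve, se


def average_studies(results: list[tuple[float, float]]) -> tuple[float, float]:
    """Average (VE, SE) pairs from N studies; the SE combines them as in eq. (7)."""
    n = len(results)
    mean_ve = sum(ve for ve, _ in results) / n
    se = math.sqrt(sum(s ** 2 for _, s in results)) / n
    return mean_ve, se


# 1982-83 counts from Table 2 (n_u, N_u, n_v, N_v)
ve, se = vaccine_effectiveness(48, 118, 31, 121)
print(f"1982-83: {100 * ve:.1f} +/- {100 * se:.1f} %")

# 1995-96 (b): two studies pooled with eq. (7)
pooled_ve, pooled_se = average_studies([
    vaccine_effectiveness(99, 652, 57, 684),
    vaccine_effectiveness(176, 652, 149, 684),
])
print(f"1995-96 (b): {100 * pooled_ve:.1f} +/- {100 * pooled_se:.1f} %")
```

Averaging the per-study VE values directly, rather than pooling the raw counts, mirrors how the text describes combining studies within a season.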

Compared to H3N2, subtype H1N1 viruses were dominant in fewer years. Based on the proportions of H3N2, H1N1, and influenza B samples collected in each year during 1977-2009, widespread H1N1 circulation was observed in approximately 10 seasons. Epidemiological studies of vaccine effectiveness were absent for some years when H1N1 circulated. Additionally, we used the criteria listed below to filter all available literature.

To ensure that the vaccine effectiveness we collected from the literature is for H1N1, the seasons and geographic regions of the epidemiological studies were compared with the influenza activity information in the WHO Weekly Epidemiological Record to confirm that those regions were dominated by H1N1 in those seasons. Subjects were restricted to healthy adults aged 18-64 years to avoid effects of an underdeveloped immune system in children or of immunosenescence in older people. If more than one measure of vaccine effectiveness was collected for the same season, the measures were averaged to minimize statistical noise.

To minimize the effect of co-circulating subtypes such as H3N2 on vaccine effectiveness, only epidemiological data collected in the regions and flu seasons in which the H1N1 subtype was dominant were used to calculate vaccine effectiveness in this study. The seasons in which the H1N1 subtype was dominant were reported by the literature on H1N1 vaccine effectiveness. The studies cited in Table 2 for the calculation of vaccine effectiveness gave the subtype of the predominant epidemic virus as well as of the virus sampled from subjects with influenza-like illness (ILI). In addition, the dominance of the H1N1 subtype is documented in the CDC Morbidity and Mortality Weekly Report and the WHO Weekly Epidemiological Record. For the data in Table 2, the dominance of the H1N1 subtype was shown in these references.

The vaccine effectiveness values collected from various flu seasons and regions were measured with standard errors. Biases in vaccine effectiveness arise from the complexity of the measurement, including the character of the human population studied, such as age, immune history, and health condition; the influence of co-circulating H3N2 influenza strains; the character of the vaccine distributed, such as live attenuated virus vaccine, inactivated split-virus vaccine produced by virion disassembly, or subunit vaccine containing only hemagglutinin and neuraminidase; the method of epidemiological measurement of influenza infection, such as virus detection, confirmed symptomatic influenza, or influenza-like illness (ILI); the design of the experiment, such as natural infection or experimental challenge study; and the progression of the epidemic in the population under study. These biases are thus inevitable with current technology. Here, we applied the following methods to minimize biases in the vaccine effectiveness data. Subjects in the studies were confined to healthy adults aged 18-64 years to preclude interference from the weaker immune systems of children and older people, because variation in the capability of the immune system is a determinant of vaccine effectiveness for a given pair of vaccine strain and dominant circulating strain. Only epidemiological studies in the seasons and regions in which the H1N1 subtype was dominant were used to obtain the vaccine effectiveness data. The vaccine involved in the referenced studies is an inactivated vaccine; other types, such as the cold-adapted nasal spray vaccine, were excluded. The epidemiological measurement of infection in all the referenced studies used ILI as the criterion. Not all studies were designed as challenge studies. We assume that the epidemic propagates through the population in a similar way in each season. These criteria were used to filter the available references and to obtain vaccine effectiveness data with minimum bias. The standard errors of the data are presented here. These criteria reduced the number of usable references for each season. Our meta-analysis considered 50 peer-reviewed papers, all we could find in the literature. We list the ones that satisfy our selection criteria for each year, typically 1-3 per year.



2.3 Antigenic Distance Measured by Sequence Data

Figure 1 shows the HA1 domain with the five epitopes of the H1 subtype hemagglutinin. As an improvement on a previous definition of H1 epitopes (Caton et al. 1982), these five H1 epitopes are recognized by host antibodies and were identified by mapping the well-defined epitopes in H3 hemagglutinin (Macken et al. 2001; Wiley et al. 1981) onto H1 hemagglutinin and by using sequence entropy to find additional sites under selection (Deem and Pan 2009).

The antigenic distance between the vaccine strain and the dominant circulating strain is the input for the vaccine effectiveness prediction. The fraction of mutated amino acids in the epitope region of HA, or the p-value, is an antigenic distance measure that quantifies the similarity between two strains (Gupta et al. 2006). One p-value is calculated for each H1 epitope:

$$p\text{-value} = \frac{\text{number of mutations in the epitope}}{\text{number of amino acids in the epitope}}. \qquad (8)$$

The quantity p_epitope is defined as the maximum of the five p-values for the five epitopes, and the dominant epitope is defined as the epitope attaining this maximum. For H3N2, this definition, i.e., assumption, has led to vaccine effectiveness predictions that correlate with those observed (Gupta et al. 2006).
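Equation (8) and the maximum rule can be sketched as follows. The sketch assumes the vaccine and circulating HA1 sequences are already aligned to equal length, with no gaps to handle, and the epitope position sets are hypothetical toy values, not the actual H1 epitope sites of Deem and Pan (2009). When several epitopes tie for the maximum, the first one listed is reported as dominant; this tie-breaking rule is our assumption.

```python
def epitope_p_values(vaccine: str, circulating: str,
                     epitopes: dict[str, list[int]]) -> dict[str, float]:
    """p-value of eq. (8) for each epitope: mutated sites / sites in the epitope."""
    if len(vaccine) != len(circulating):
        raise ValueError("sequences must be aligned to the same length")
    return {
        name: sum(vaccine[i] != circulating[i] for i in sites) / len(sites)
        for name, sites in epitopes.items()
    }


def p_epitope(vaccine: str, circulating: str,
              epitopes: dict[str, list[int]]) -> tuple[float, str]:
    """Return (p_epitope, dominant epitope): the largest p-value and its epitope."""
    p = epitope_p_values(vaccine, circulating, epitopes)
    dominant = max(p, key=p.get)  # on ties, the first epitope listed wins
    return p[dominant], dominant


# HYPOTHETICAL 0-based epitope positions on a toy 20-residue alignment.
TOY_EPITOPES = {
    "A": [0, 1, 2, 3],
    "B": [4, 5, 6, 7, 8],
    "C": [9, 10, 11],
    "D": [12, 13, 14, 15],
    "E": [16, 17, 18, 19],
}

vaccine = "ACDEFGHIKLMNPQRSTVWY"
circulating = "SCDEFDHVKLMNPQRSTVWY"  # substitutions at positions 0, 5, and 7
print(p_epitope(vaccine, circulating, TOY_EPITOPES))  # (0.4, 'B')
```

Here epitope A has one substitution out of four sites (0.25) and epitope B two out of five (0.4), so B is reported as the dominant epitope.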
Table 1: HI table with two strains and four HI titers.

             Ferret antisera against Strain 1    Ferret antisera against Strain 2
Strain 1     H11                                 H12
Strain 2     H21                                 H22


Another sequence-based antigenic distance measure uses the fraction of mutated amino acids in all five epitopes:

$$p_{\text{all-epitope}} = \frac{\text{number of mutations in all five epitopes}}{\text{number of amino acids in all five epitopes}}. \qquad (9)$$

As an alternative to p_epitope and p_all-epitope, p_sequence is also used, with the definition

$$p_{\text{sequence}} = \frac{\text{number of mutations in the HA1 domain of hemagglutinin}}{\text{number of amino acids in the HA1 domain of hemagglutinin}}. \qquad (10)$$
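Equations (9) and (10) differ from equation (8) only in which aligned positions are counted. In the sketch below the sequences are assumed pre-aligned, and the epitope position sets are hypothetical toy values. Pooling the epitope sites as a set union is our assumption; it coincides with simply adding the epitope sizes whenever the epitopes do not overlap.

```python
def fraction_mutated(a: str, b: str, sites=None) -> float:
    """Fraction of the given aligned positions (default: all) where a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    positions = list(range(len(a))) if sites is None else list(sites)
    return sum(a[i] != b[i] for i in positions) / len(positions)


def p_all_epitope(a: str, b: str, epitopes: dict[str, list[int]]) -> float:
    """Eq. (9): pool the sites of all epitopes (union, so overlaps count once)."""
    pooled = sorted(set().union(*epitopes.values()))
    return fraction_mutated(a, b, pooled)


def p_sequence(a: str, b: str) -> float:
    """Eq. (10): fraction of mutated positions over the whole aligned HA1 domain."""
    return fraction_mutated(a, b)


# HYPOTHETICAL toy epitopes covering 12 of 20 positions.
TOY_EPITOPES = {"A": [0, 1, 2], "B": [5, 6, 7], "C": [10, 11],
                "D": [13, 14], "E": [17, 18]}
vaccine = "ACDEFGHIKLMNPQRSTVWY"
circulating = "SCDEFDHVKLMNPQRNTVWY"  # 3 substitutions inside epitopes, 1 outside
print(p_all_epitope(vaccine, circulating, TOY_EPITOPES))  # 0.25
print(p_sequence(vaccine, circulating))                   # 0.2
```

The substitution at position 15 lies outside every toy epitope, so it raises p_sequence but leaves p_all-epitope unchanged.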

2.4 Antigenic Distance Measured by Hemagglutination Inhibition 

The animal model method for determining the distance between the vaccine strain and the dominant circulating strain uses the HI assay to produce the HI table (see Table 1). Here H_ij, i, j = 1, 2, are four HI titers measuring the capability of antibody j to inhibit hemagglutinin i. Note that in practice, health authorities including WHO and CDC provide HI tables with at least eight antisera to evaluate the antigenic distance between candidate vaccine strains and the dominant circulating strain. These HI tables are mathematically equivalent to several 2 × 2 HI tables, each of which defines the antigenic distance between one pair of strains in the original HI table. For each pair of strains, we picked the four entries determined by the identities of these two strains and the two corresponding antisera from the original HI table. The 2 × 2 HI tables in this manuscript are used to present the formulae for d1 and d2. In this context Strain 1 is the vaccine strain and Strain 2 is the dominant circulating strain. Two distance measures have been derived from these four HI titers in the HI table (Lee and Chen 2004; Smith et al. 1999):

$$d_1 = \log_2\left(\frac{H_{11}}{H_{21}}\right) \qquad (11)$$



Note that antigenic cartography is carried out on the asymmetric distance d1 (Smith et al. 2004). When the vaccine strain and the dominant circulating strain in one season were not identical, we searched the literature for HI tables containing these two strains. The d1 and d2 values were averaged if multiple HI tables were found for one season.
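A sketch of how the two HI distances and their per-season average could be computed from 2 × 2 tables laid out as in Table 1, with each table given as (H11, H12, H21, H22). The d1 function uses log2(H11/H21) as in equation (11). The form of d2 is an assumption here: the Archetti-Horsfall ratio sqrt(H11 H22 / (H12 H21)), which equals 1 for identical strains; check it against the source definition before relying on it. All titers are hypothetical.

```python
import math


def d1(h11: float, h21: float) -> float:
    """Eq. (11): log2 of the homologous titer H11 over the heterologous titer H21,
    both measured with antiserum raised against strain 1 (the vaccine strain)."""
    return math.log2(h11 / h21)


def d2(h11: float, h12: float, h21: float, h22: float) -> float:
    """ASSUMED Archetti-Horsfall form sqrt(H11*H22 / (H12*H21)); 1 for identical strains."""
    return math.sqrt((h11 * h22) / (h12 * h21))


def season_distances(tables: list[tuple[float, float, float, float]]) -> tuple[float, float]:
    """Average d1 and d2 over several 2x2 tables (H11, H12, H21, H22) for one season."""
    d1s = [d1(h11, h21) for h11, _, h21, _ in tables]
    d2s = [d2(*t) for t in tables]
    return sum(d1s) / len(d1s), sum(d2s) / len(d2s)


# HYPOTHETICAL titers from two HI tables for the same vaccine/circulating pair.
tables = [(640, 320, 320, 640), (1280, 160, 640, 640)]
print(season_distances(tables))  # d1 averages to 1.0; d2 to (2 + sqrt(8)) / 2
```

Averaging the distances, rather than the raw titers, follows the text's statement that d1 and d2 values were averaged across tables.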
Figure 1: HA1 domain of the H1 hemagglutinin in ribbon format (PDB code: 1RU7). Epitopes A (blue), B (red), C (cyan), D (yellow), and E (red) are shown in space-filling representation. These five H1 epitopes are the analogs of the well-defined H3 epitopes (Deem and Pan 2009).
Table 2: Summary of results. Nine pairs of vaccine strains and dominant circulating strains in seven flu seasons in the Northern Hemisphere were collected from the literature. The quantities n_u, N_u, n_v, N_v, p_epitope, p_all-epitope, p_sequence, d1, and d2 are defined in Materials and Methods. Only seasons in which H1N1 virus was dominant in at least one country or region with available vaccine effectiveness data were considered. Two different vaccines were occasionally adopted in different geographic regions in the same season, in which case two sets of data are included in this table. An asterisk signifies that co-circulating H3N2 was also found in the same country or region in that season; however, the interference from H3N2 with the final result is expected to be small, so the data sets marked with an asterisk were retained.



Season       | Vaccine strain           | Dominant circulating strain (a) | VE (%)      | n_u     | N_u      | n_v     | N_v      | Dominant epitope | p_epitope | p_all-epitope | p_sequence | d1           | d2
1982-83      | A/Brazil/11/78           | A/England/333/80                | 37.0 ± 12.0 | 48      | 118      | 31      | 121      | A                | 0.083     | 0.0311        | 0.0184     | 0 [13]       | 1.41 [13]
1983-84      | A/Brazil/11/78           | A/Victoria/7/83                 | 38.1 ± 10.3 | 30, …   | 60, 298  | 21, 46  | 67, 300  | C                | 0.121     | 0.0497        | 0.0337     | 1.13 [11-13] | 13.66 [11-13]
1986-87 (a)  | A/Taiwan/1/86            | A/Taiwan/1/86                   | 64.8 ± 14.3 | 11      | 217      | 13      | 723      | -                | 0         | 0             | 0          | 0            | 1
1986-87 (b)  | A/Chile/1/83             | A/Taiwan/1/86                   | 18.5 ± 12.1 | 92      | 878      | 75      | 878      | B                | 0.318     | 0.0807        | 0.0399     | 4 [12,14-18] | 24.8 [14,16-18]
1988-89      | A/Taiwan/1/86            | A/Taiwan/1/86                   | 43.1 ± 10.0 | 119     | 1125     | 89      | 1126     | -                | 0         | 0             | 0          | 0            | 1
1995-96 (a)  | A/Texas/36/91            | A/Texas/36/91                   | 60.0 ± 27.8 | 6       | 12       | 2       | 10       | -                | 0         | 0             | 0          | 0            | 1
1995-96 (b)* | A/Singapore/6/86         | A/Texas/36/91                   | 32.2 ± 5.8  | 99, 176 | 652, 652 | 57, 149 | 684, 684 | A                | 0.125     | 0.0559        | 0.0307     | 0.86 [19,20] | 2.43 [14,20]
2006-07      | A/New Caledonia/20/99    | A/New Caledonia/20/99           | 40.5 ± 2.5  | 1085    | 230729   | 1221    | 436600   | -                | 0         | 0             | 0          | 0            | 1
2007-08*     | A/Solomon Islands/3/2006 | A/Solomon Islands/3/2006        | 62.8 ± 12.6 | 94      | 262      | 8       | 60       | -                | 0         | 0             | 0          | 0            | 1

Numbers in brackets refer to the literature used in the meta-analysis, listed below; cells with two values report two separate studies.
(a) Multiple strains circulate in each season, and each strain makes up a specific proportion of the virus population in a given region and season. The strain with the greatest proportion is defined as the dominant circulating strain, which is listed in this table. The dominant circulating strains in this table were chosen based on the literature on vaccine effectiveness, which also gave the region where the effectiveness data were collected.

Literature used in the meta-analysis: 1. (Couch et al. 1986); 2. (Keitel et al. 1988); 3. (Couch et al. 1996); 4. (Keitel et al. 1997); 5. (Edwards et al. 1994); 6. (Treanor et al. 1999); 7. (Grotto et al. 1998); 8. (Wang et al. 2009); 9. (Belongia et al. 2008); 10. (Daniels et al. 1985); 11. (Chakraverty et al. 1986); 12. (Smith et al. 1999); 13. (WHO 1981); 14. (… et al. 2001); 15. (WHO 1986); 16. (Kendal et al. 1990); 17. (Donatelli et al. 1993); 18. (Brown et al. 1998); 19. (WHO 1992); 20. (Rimmelzwaan et al. 2001).



3 Results 



We performed a meta-analysis of the identities of the vaccine strains and dominant circulating strains, vaccine effectiveness, and antigenic distances between vaccine strains and dominant circulating strains measured with the HI assay using ferret antisera. For each season dominated by H1N1, epidemiological statistics for a given region reported in the literature were used to fix the values of n_u, N_u, n_v, N_v, and the mean and standard error of the vaccine effectiveness. HI assay data in the literature were also used to determine the antigenic distances d1 and d2 between the vaccine strain and the dominant circulating strain. Results of the meta-analysis are listed in Table 2. The sequence-based antigenic distances p_epitope, p_all-epitope, and p_sequence are calculated from the sequences of the vaccine strain and dominant circulating strain by equations (8), (9), and (10), respectively. Values of p_epitope, p_all-epitope, and p_sequence in each season dominated by H1N1 are also listed in Table 2.

Although the number of data points is limited, a least-squares fit shows a linear relationship between vaccine effectiveness and p_epitope. Similar to the case for H3N2 influenza (Gupta et al. 2006), p_epitope strongly correlates with H1N1 vaccine effectiveness, with R^2 = 0.68. The fitted model predicts a vaccine effectiveness of 52.7% when p_epitope = 0, and vaccine effectiveness is greater than zero when p_epitope < 0.442. In Figure 2, the fitted trend line is within one standard error of all data points with p_epitope > 0, validating the ability of the p_epitope model to predict vaccine effectiveness from only the sequences of the vaccine strain and the dominant circulating strain.
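The least-squares fit described above can be reproduced from the nine (p_epitope, vaccine effectiveness) pairs in Table 2. The sketch assumes an unweighted ordinary least-squares fit, i.e. the per-season standard errors are not used as weights; with that assumption it recovers the slope of about -1.19, the 52.7% intercept, R^2 of about 0.68, and the zero crossing near p_epitope = 0.442 quoted in the text.

```python
# (p_epitope, observed vaccine effectiveness) for the nine rows of Table 2
DATA = [
    (0.083, 0.370),  # 1982-83
    (0.121, 0.381),  # 1983-84
    (0.000, 0.648),  # 1986-87 (a)
    (0.318, 0.185),  # 1986-87 (b)
    (0.000, 0.431),  # 1988-89
    (0.000, 0.600),  # 1995-96 (a)
    (0.125, 0.322),  # 1995-96 (b)
    (0.000, 0.405),  # 2006-07
    (0.000, 0.628),  # 2007-08
]


def ols(points: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Unweighted least-squares line y = slope * x + intercept, with R^2."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


slope, intercept, r2 = ols(DATA)
zero_crossing = -intercept / slope  # p_epitope at which predicted VE reaches 0
print(f"VE = {slope:.2f} p_epitope + {intercept:.3f}, R^2 = {r2:.2f}, "
      f"VE > 0 for p_epitope < {zero_crossing:.3f}")
```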

Although statistical errors exist in the observed vaccine effectiveness, the collected vaccine effectiveness data reject the null hypothesis that vaccine effectiveness is independent of p_epitope. The nine pairs of vaccine strains and dominant circulating strains in Table 2 have five different antigenic distances between vaccine strain and dominant circulating strain as defined by p_epitope. The nine pairs of strains were thus categorized into groups 1-5, with p_epitope equal to 0, 0.083, 0.121, 0.125, and 0.318, respectively, and the average vaccine effectiveness and standard error were calculated for each group. Differences in vaccine effectiveness between these groups were significant, for example between group 1 and group 4 (p = 0.0079) and between group 1 and group 5 (p = 0.0054). Moreover, statistical analysis shows that the introduction of p_epitope is valuable in the selection process of vaccine strains. The slope of the fitted line is significantly smaller than zero (p = 0.0027). Hence the linear model is able to predict vaccine effectiveness from knowledge of p_epitope. In other words, the non-zero slope of vaccine effectiveness as a function of p_epitope is significant at the 0.27% level.
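The group comparison can be sketched as below. The text does not name the test, so two choices here are assumptions: each group's mean and standard error are formed as in equation (7), and pairs of groups are compared with a one-sided z-test on the difference of their means. Under these assumptions the sketch gives p of about 0.0079 (group 1 vs. group 4) and about 0.0054 (group 1 vs. group 5), in line with the values quoted. The rows are (p_epitope, VE %, SE %) from Table 2.

```python
import math

# (p_epitope, vaccine effectiveness %, standard error %) from Table 2
ROWS = [
    (0.083, 37.0, 12.0), (0.121, 38.1, 10.3), (0.0, 64.8, 14.3),
    (0.318, 18.5, 12.1), (0.0, 43.1, 10.0), (0.0, 60.0, 27.8),
    (0.125, 32.2, 5.8), (0.0, 40.5, 2.5), (0.0, 62.8, 12.6),
]


def group_summary(rows):
    """Group rows by p_epitope; return {p: (mean VE, SE)} with the SE from eq. (7)."""
    groups: dict[float, list[tuple[float, float]]] = {}
    for p, ve, se in rows:
        groups.setdefault(p, []).append((ve, se))
    summary = {}
    for p, members in groups.items():
        n = len(members)
        mean = sum(ve for ve, _ in members) / n
        se = math.sqrt(sum(s ** 2 for _, s in members)) / n
        summary[p] = (mean, se)
    return summary


def one_sided_p(higher, lower):
    """ASSUMED one-sided z-test that group `higher` has a larger mean VE than `lower`."""
    (m1, s1), (m2, s2) = higher, lower
    z = (m1 - m2) / math.hypot(s1, s2)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability 1 - Phi(z)


groups = group_summary(ROWS)
print(round(one_sided_p(groups[0.0], groups[0.125]), 4))  # group 1 vs 4 (text: 0.0079)
print(round(one_sided_p(groups[0.0], groups[0.318]), 4))  # group 1 vs 5 (text: 0.0054)
```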

Two other sequence-based antigenic distance measures that serve as alternatives to p_epitope are p_all-epitope and p_sequence. Unlike p_epitope, which focuses on mutations in the dominant epitope, p_all-epitope calculates the fraction of mutated amino acids in all five epitopes, and p_sequence calculates the fraction of mutated amino acids in the whole HA1 domain of hemagglutinin. The p_sequence measure is also one of the optional distance measures in phylogenetic software. In Figure 3, the correlation between H1N1 vaccine effectiveness and p_all-epitope has R^2 = 0.70. In Figure 4, the correlation between H1N1 vaccine effectiveness and p_sequence has R^2 = 0.66. The predicted vaccine effectiveness of 54% when p_all-epitope = 0 in Figure 3 and when p_sequence = 0 in Figure 4 is almost the same as the 53% predicted by the p_epitope method. By contrast, p_all-epitope and p_sequence for H3N2 have less impressive correlations with H3N2 vaccine effectiveness (Gupta et al. 2006; Sun et al. 2006), and p_all-epitope and p_sequence are not as effective as p_epitope as antigenic distance measures and vaccine effectiveness predictors for H3N2.
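The fits quoted for p_all-epitope and p_sequence can be checked the same way, using the values listed in Table 2 (0 for seasons with identical vaccine and circulating strains). As before, the sketch assumes an unweighted ordinary least-squares fit; it recovers slopes of about -4.16 and -7.37, intercepts of about 0.54, and R^2 of about 0.70 and 0.66.

```python
# Observed vaccine effectiveness (fraction) for the nine Table 2 rows, in table order
VE = [0.370, 0.381, 0.648, 0.185, 0.431, 0.600, 0.322, 0.405, 0.628]
# Alternative sequence-based distances for the same rows
P_ALL_EPITOPE = [0.0311, 0.0497, 0.0, 0.0807, 0.0, 0.0, 0.0559, 0.0, 0.0]
P_SEQUENCE = [0.0184, 0.0337, 0.0, 0.0399, 0.0, 0.0, 0.0307, 0.0, 0.0]


def ols(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Unweighted least-squares line y = slope * x + intercept, with R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


for name, xs in [("p_all-epitope", P_ALL_EPITOPE), ("p_sequence", P_SEQUENCE)]:
    slope, intercept, r2 = ols(xs, VE)
    print(f"{name}: VE = {slope:.2f} x + {intercept:.2f}, R^2 = {r2:.2f}")
```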

The HI assay and the derived distance measures d1 and d2 are still the measures most widely used by researchers and health authorities to identify newly collected circulating strains. These methods are used to recommend the vaccine strain for the coming flu season (Cox et al. 2007; … 2003; WHO Collaborating Center for Surveillance and Control of Influenza 2008), to draw the antigenic map (Smith et al. 2004), and to support the phylogenetic data (Cox et al. 2003). Figures 5 and 6 describe the correlation between vaccine effectiveness and the antigenic distances d1 and d2 from the HI assay. A correlation is found in both figures. In the 1995-96 season in Israel, the vaccine strain was A/Singapore/6/86 (H1N1) and the dominant circulating strain was A/Texas/36/91 (H1N1), between which the averaged d1 is 0.86. Because the observed vaccine effectiveness is only 32.2%, its discrepancy from the corresponding trend-line value of 42.5% is much larger than one standard error of vaccine effectiveness. Similarly, the same pair of vaccine strain and dominant circulating strain gives a data point farther from the trend line if d2 is used as the distance measure. We also note that two strains can be antigenically identical as measured with the HI assay but antigenically distinct as measured with p_epitope. As shown in Table 2, in the 1982-1983 season the H1N1 vaccine strain A/Brazil/11/78 and the dominant circulating strain A/England/333/80 had an HI antigenic distance of d1 = 0 but a sequence-based antigenic distance of p_epitope = 0.083. The H3N2 vaccine strain and dominant circulating strain showed identical d1 and d2 values but distinct p_epitope values in the 1996-1997 and 2004-2005 seasons (Gupta et al. 2006). Note that if p_epitope is incorporated into the linear models shown in Figures 5 and 6, the R^2 value increases. We fit the linear model vaccine effectiveness = α + β1 p_epitope + β2 d1 + β3 d2 + ε, in which ε is an error term. The fitted model is vaccine effectiveness = 0.54 - 2.179 p_epitope + 0.068 d1 + 0.003 d2, with R^2 = 0.72.
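The 42.5% trend-line value for the 1995-96 Israeli data point can be checked with a short sketch. It fits vaccine effectiveness against the averaged d1 values of Table 2 (0 for identical strains), again assuming an unweighted ordinary least-squares fit, which gives a line close to VE = -0.085 d1 + 0.50 with R^2 near 0.53, and then compares the trend-line value at d1 = 0.86 with the observed 32.2 ± 5.8%.

```python
# Observed VE (fraction) and averaged d1 for the nine Table 2 rows, in table order
VE = [0.370, 0.381, 0.648, 0.185, 0.431, 0.600, 0.322, 0.405, 0.628]
D1 = [0.0, 1.13, 0.0, 4.0, 0.0, 0.0, 0.86, 0.0, 0.0]


def ols(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Unweighted least-squares line y = slope * x + intercept, with R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


slope, intercept, r2 = ols(D1, VE)
predicted = intercept + slope * 0.86  # trend-line VE for 1995-96 (b)
observed, se = 0.322, 0.058           # Table 2 entry, as fractions
print(f"VE = {slope:.3f} d1 + {intercept:.2f}, R^2 = {r2:.2f}")
print(f"predicted {predicted:.3f} vs observed {observed}: "
      f"gap exceeds one SE: {predicted - observed > se}")
```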



4 Discussion 



4.1 Verification of the p_epitope Model

The p_epitope model was originally implemented for the H3N2 virus, where p_epitope correlates with H3N2 vaccine effectiveness with a significantly larger R^2 than do p_all-epitope and p_sequence (Gupta et al. 2006; Sun et al. 2006). In the case of H1N1, the advantage of p_epitope over p_all-epitope and p_sequence is not as remarkable as for H3N2. We speculate that antibodies against the H3N2 virus may bind to a small fixed region on the surface of H3 hemagglutinin, while antibodies against the H1N1 virus may have multiple binding regions available. In other words, we speculate that the dominant epitope in H3 hemagglutinin may contribute substantially to the escape of the H3N2 virus from host antibodies, while escape mutations in H1 hemagglutinin may occur in the dominant epitope as well as, perhaps, the subdominant epitopes. This speculation is based on the fact that the epitope region in H1N1 contains more amino acid positions than does that in H3N2 (Deem and Pan 2009).

Two recent epidemiological studies (Centers for Disease Control and Prevention (CDC) 2009c; Skowronski et al. 2010) provide further support for the p_epitope model. Before the emergence of the H1N1 pandemic flu in April 2009, the 2008-2009 flu season was dominated by seasonal flu of subtype H1N1. Both the dominant circulating strain and the vaccine strain in the 2008-2009 season were A/Brisbane/57/2007 (H1N1) (Centers for Disease Control and Prevention (CDC) 2009d). The observed vaccine effectiveness against seasonal flu was 44% (95% CI: 33% to 59%) (Skowronski et al. 2010). The p_epitope model predicts a vaccine effectiveness of 53%, which falls within the 95% CI of the reported vaccine effectiveness.

After April 2009, a new peak of influenza activity emerged. The dominant circulating strain in this period was the pandemic H1N1 strain A/California/7/2009 (Centers for Disease Control and Prevention (CDC) 2009b). The reported effectiveness of the 2008-2009 seasonal flu vaccine against the pandemic H1N1 flu was -50% to -150% (Skowronski et al. 2010) and -10% (95% CI: -43% to 15%) (Centers for Disease Control and Prevention (CDC) 2009a). The value of p_epitope between A/California/7/2009 and A/Brisbane/57/2007 is 0.77, with epitope B as the dominant epitope. The vaccine effectiveness forecast by the p_epitope model is -39%, which agrees with the measured vaccine effectiveness values.

Figure 2: Vaccine effectiveness for influenza-like illness correlates with p_epitope, R^2 = 0.68 (solid line). Data from Table 2. The trend line quantifies vaccine effectiveness as a decreasing linear function of p_epitope: vaccine effectiveness = -1.19 p_epitope + 0.53. Also shown is the vaccine effectiveness for H3N2 (dashed line) (Gupta et al. 2006).

Figure 3: Vaccine effectiveness for influenza-like illness correlates with p_all-epitope, R^2 = 0.70. Data from Table 2. The trend line quantifies vaccine effectiveness as a decreasing linear function of p_all-epitope: vaccine effectiveness = -4.16 p_all-epitope + 0.54.

Figure 4: Vaccine effectiveness for influenza-like illness correlates with p_sequence, R^2 = 0.66. Data from Table 2. The trend line quantifies vaccine effectiveness as a decreasing linear function of p_sequence: vaccine effectiveness = -7.37 p_sequence + 0.54.

Figure 5: Correlation, with R^2 = 0.53, between vaccine effectiveness for influenza-like illness and d1, the antigenic distance defined by the HI assay using ferret antisera. Data from Table 2. The d1 values were averaged if multiple HI assay data sets were found. The trend line quantifies vaccine effectiveness as a decreasing linear function of d1: vaccine effectiveness = -0.085 d1 + 0.50.

Figure 6: Correlation, with R^2 = 0.46, between vaccine effectiveness for influenza-like illness and d2, the antigenic distance defined by the HI assay using ferret antisera. Data from Table 2. The d2 values were averaged if multiple HI assay data sets were found. The trend line quantifies vaccine effectiveness as a decreasing linear function of d2: vaccine effectiveness = -0.013 d2 + 0.51.
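Both forecasts in this subsection follow from evaluating the Figure 2 trend line, vaccine effectiveness = -1.19 p_epitope + 0.53, as the short sketch below does. Using the rounded caption coefficients is our choice; the unrounded fit gives 52.7% at p_epitope = 0, which leads to the same conclusions.

```python
def predicted_ve(p_epitope: float, slope: float = -1.19, intercept: float = 0.53) -> float:
    """Vaccine effectiveness predicted by the Figure 2 trend line."""
    return intercept + slope * p_epitope


# 2008-09: vaccine and dominant strain both A/Brisbane/57/2007, so p_epitope = 0
seasonal = predicted_ve(0.0)
print(f"{seasonal:.0%}, inside reported 95% CI: {0.33 <= seasonal <= 0.59}")

# Pandemic A/California/7/2009 vs. A/Brisbane/57/2007: p_epitope = 0.77 (epitope B)
print(f"{predicted_ve(0.77):.0%}")
```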

4.2 Comparison of H3N2 and H1N1 Vaccine Effectiveness and Evolution Rates

The p_epitope model has previously been applied to the prediction of H3N2 vaccine effectiveness (Gupta et al. 2006). The H3N2 vaccine effectiveness at p_epitope = 0 is 44.6%, and vaccine effectiveness is greater than zero for p_epitope < 0.184 (Gupta et al. 2006). Thus, H1N1 vaccines tend to have higher vaccine effectiveness than H3N2 vaccines, as shown in Figure 2. The comparison between H3N2 and H1N1 vaccine effectiveness (Figure 2 versus Figure 2 of Gupta et al. 2006) illustrates that the H1N1 vaccine has higher effectiveness than the H3N2 vaccine as a function of p_epitope. This observation suggests that the host immune system is more effective at recognizing and eliminating the H1N1 virus (p_epitope = 0), and that humoral cross-immunity is stronger for H1 hemagglutinin (p_epitope > 0). This observation also explains why an H3N2 epidemic is usually a more severe health threat than an H1N1 epidemic. We propose that H1N1 has a longer history of circulation in the human population, so the human immune system may recognize H1N1 more effectively; this may be why, under stronger immune pressure, the H1N1 virus may have a higher degree of adaptation to the human host. In the following discussion, we support this hypothesis with two facts. First, the H1N1 virus has a larger antigenic diversity than does the H3N2 virus. Second, the H1N1 virus has a higher evolutionary rate on a per-dominant-season basis.

To compare the antigenic diversities of H1N1 and H3N2, on 13 August 2009 we downloaded from the NCBI database all the amino acid sequences of H3 hemagglutinin collected in the 18 years with H3N2 dominant circulating strains (Gupta et al. 2006) and those of H1 hemagglutinin collected in the 7 years with H1N1 dominant circulating strains (Table 1). Thus 18 subsets of H3N2 sequences and 7 subsets of H1N1 sequences were formed. The center of each subset is the vaccine strain of the same season as the circulating virus. The radius of each subset is obtained from p_epitope in three steps. First, the strains in the top 5% of p_epitope distance to the center of each subset were selected, to focus on the extent of viral evolution. Second, the p_epitope values between these selected strains and the center were averaged within each year to give that year's radius. Third, the radii were averaged over all 18 years for H3N2 and over all 7 years for H1N1. As a result, with the vaccine strains as centers, the average H3N2 subset radius is 0.211 and the average H1N1 subset radius is 0.520. The difference between the H3N2 and H1N1 radii is significant (p = 0.0118, Wilcoxon rank-sum test). Consequently, the H1N1 virus has a larger antigenic diversity in each season than the H3N2 virus, as shown in Figure 7.
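The three-step radius calculation above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes the p_epitope distances from each season's vaccine strain to that season's sequences have already been computed, and it assumes the top 5% is rounded up to at least one strain, a detail the text does not specify. The function names `subset_radius` and `average_radius` are hypothetical.

```python
import math
from statistics import mean


def subset_radius(distances, top_percent=5):
    """Radius of one season's subset: mean p_epitope of the most distant strains.

    distances: p_epitope values from the season's vaccine strain (the center)
    to each sequenced strain. Rounding the top-percent count up to at least
    one strain is an assumption of this sketch.
    """
    if not distances:
        raise ValueError("a season needs at least one sequence")
    k = max(1, math.ceil(len(distances) * top_percent / 100))
    farthest = sorted(distances, reverse=True)[:k]
    return mean(farthest)


def average_radius(distances_by_season, top_percent=5):
    """Average the per-season radii over all dominant seasons of one subtype."""
    return mean(subset_radius(d, top_percent) for d in distances_by_season.values())
```

Called with the 18 H3N2 seasons or the 7 H1N1 seasons, `average_radius` would return one average radius per subtype, the quantity compared in the text.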

We also compared the evolutionary rates of H1N1 and H3N2, because the evolutionary rate of a virus is an index of the selection pressure on it. The virus undergoes weaker immune pressure in a non-dominant season and stronger immune pressure in a dominant season. It has been noticed that in H1 and H3 hemagglutinin, the region outside the epitopes has a significantly lower evolutionary rate than do the epitopes (Deem and Pan 2009; Ferguson et al. 2003). This phenomenon indicates that without immune pressure, the spontaneous evolutionary rates of both H1N1 and H3N2 are low. Therefore, a higher evolutionary rate of one virus subtype in a dominant season comes from higher immune pressure rather than from neutral evolution, and we reject the alternative scenario that a higher evolutionary rate causes a virus subtype to be dominant in one season. The evolutionary rate per dominant season is thus a natural measure of virus evolution. Between 1983 and 1997, H3N2 was dominant in 8 of 15 years, and between 1977 and 2000, H1N1 was dominant in 5 of 24 years (Ferguson et al. 2003). Between 1980 and 2000, the HA1 domain of H3 hemagglutinin has a higher annual evolutionary rate, 3.7 × 10^-3 nucleotide substitution/site/year, than does the HA1 domain of H1 hemagglutinin, 1.8 × 10^-3 nucleotide substitution/site/year (Ferguson et al. 2003). Measured on a per-dominant-season basis, however, the HA1 domain of H1 hemagglutinin evolves faster in its dominant season, at 8.6 × 10^-3 nucleotide substitution/site/dominant season, than does that of H3 hemagglutinin, at 6.9 × 10^-3 nucleotide substitution/site/dominant season. The difference is significant (p = 0.0008). Similarly, between 2000 and 2007, the HA1 domain of H1 hemagglutinin evolves faster in its dominant season, at 10.2 × 10^-3 nucleotide substitution/site/dominant season, than does that of H3 hemagglutinin, at 7.4 × 10^-3 nucleotide substitution/site/dominant season. The difference is significant (p = 0.0005) (Zaraket et al. 2009). Here we have divided the annual evolutionary rate by the proportion of dominant years for both H1 and H3 hemagglutinin. Even on a short time scale without fixation, H1 hemagglutinin shows a comparable or higher mutation rate, 9.1 × 10^-? nucleotide substitution/site/day, than H3 hemagglutinin, 4.2 × 10^-? nucleotide substitution/site/day (p = 0.26) (Nobusawa and Sato 2006), probably caused by adaptation to the higher immune pressure, at least for some strains. To make this last point, we have assumed that the mutation rate of the HA gene is the same as that of the NS gene: the same polymerase operates on both genes, so their mutation rates are expected to be the same. The comparisons of evolutionary rates and mutation rates between H3N2 and H1N1 are summarized in Figure 7.
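The per-dominant-season rates quoted above follow from dividing each annual rate by the fraction of years in which the subtype was dominant. The short Python sketch below reproduces that arithmetic for the 1980-2000 figures, with rates expressed in units of 10^-3 nucleotide substitution/site; the function name is hypothetical.

```python
def per_dominant_season_rate(annual_rate, dominant_years, total_years):
    """Annual rate divided by the proportion of years in which the subtype dominated."""
    if not 0 < dominant_years <= total_years:
        raise ValueError("need 0 < dominant_years <= total_years")
    return annual_rate * total_years / dominant_years


# Rates in units of 10^-3 nucleotide substitution/site (Ferguson et al. 2003).
h3 = per_dominant_season_rate(3.7, dominant_years=8, total_years=15)  # H3N2: 8 of 15 years
h1 = per_dominant_season_rate(1.8, dominant_years=5, total_years=24)  # H1N1: 5 of 24 years
print(f"H3 HA1: {h3:.1f}, H1 HA1: {h1:.1f} (x 10^-3 per dominant season)")
# prints: H3 HA1: 6.9, H1 HA1: 8.6 (x 10^-3 per dominant season)
```

The reversal comes entirely from the denominators: H1N1 was dominant in far fewer years, so its modest annual rate corresponds to a larger change per dominant season.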



4.3 The p_epitope Model as a Supplement to the HI Assay

For both H1N1 (this paper) and H3N2 (Gupta et al. 2006), the HI assay correlates less well with vaccine effectiveness than does p_epitope. Collecting HI assay data to measure antigenic distance is also more time-consuming and more expensive than computing p_epitope. Many hundreds of strains circulate and are collected in an average flu season, so an HI table with tens of thousands of entries would need to be built to assess the antigenic distance between each pair of strains. With high-throughput sequencing technology generating hemagglutinin sequence data, such antigenic distances are easily measured with the sequence-based measure p_epitope, which correlates to a greater degree with vaccine effectiveness than do the HI data.
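To make the scaling argument concrete: if every one of n strains had its own antiserum tested against every strain, an HI table would have n × n cells, whereas a sequence-based measure needs one computation per unordered pair, n(n − 1)/2. The Python sketch below assumes this idealized all-against-all layout, uses 300 strains purely as an illustrative value, and accepts any pairwise distance function (for example a p_epitope implementation); the helper names are hypothetical.

```python
from itertools import combinations


def hi_table_cells(n_strains):
    """Cells in an idealized HI table: one antiserum per strain, tested against every strain."""
    return n_strains * n_strains


def all_pairwise_distances(sequences, distance):
    """Distance for every unordered pair of named sequences."""
    return {
        (a, b): distance(sequences[a], sequences[b])
        for a, b in combinations(sorted(sequences), 2)
    }


n = 300  # illustrative stand-in for "many hundreds" of strains in a season
print(hi_table_cells(n))   # 90000
print(n * (n - 1) // 2)    # 44850 unordered pairs
```

Both counts fall in the "tens of thousands" range; the difference is that the sequence-based pairs cost computation time rather than ferret experiments.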

The p_epitope model is developed to provide researchers and health authorities with a new tool to quantify antigenic distance and design the vaccine. We do not suggest that p_epitope should substitute for the current HI assay, but rather that p_epitope serve as an additional assessment when selecting vaccine strains. Using p_epitope to supplement HI assay data may allow researchers and health authorities to quantify more precisely the antigenic distance between dominant circulating strains and candidate vaccine strains. Adopting the p_epitope theory may also allow researchers to minimize the cost and the number of ferret experiments and to correct HI assay data in some situations.
Figure 7: The comparison between H3N2 (triangle up) and H1N1 (triangle down) in regard to the antigenic diversity, the evolutionary rate between 1980 and 2000 (left), the evolutionary rate between 2000 and 2007 (right), and the mutation rate on a short time scale without fixation. The antigenic diversity is measured with p_epitope, the unit of evolutionary rate is 10^-3 nucleotide substitution/site/year, and the unit of mutation rate is 10^-? nucleotide substitution/site/day.
1 Humoral Immune System Plays a Major Role in Immunity to Influenza

The influenza vaccine considered in this study is the trivalent inactivated vaccine (TIV) administered by intramuscular injection. The effective components of TIV are hemagglutinin (HA) and neuraminidase (NA), which noticeably induce humoral immunity but activate cellular immunity less vigorously (Doherty and Kelso 2008). The other vaccine, the cold-adapted trivalent live attenuated influenza vaccine (LAIV), is also believed to induce cellular immunity only to a low level (Doherty and Kelso 2008).

Humoral immunity relies greatly on the antigenic distance between the hemagglutinin of the vaccine and that of the dominant circulating strain. The cellular immune system, on the other hand, focuses on the highly conserved internal proteins, namely matrix protein 1 (M1) and the nucleoprotein (NP) (Lee et al. 2008). In contrast to antibodies, CD8+ and CD4+ T cells show notable cross-immunity to a wide variety of strains (Lee et al. 2008). Like the cellular immune system, the antigen-nonspecific innate immune system generates a homogeneous immune reaction against different influenza strains (Janeway et al. 2005).

For all these reasons, ferret antisera, in which antibodies are the major immune component, are used in the hemagglutination inhibition (HI) assay as the conventional way to measure the antigenic distance between the vaccine strain and the dominant circulating strain. We therefore consider antibodies, rather than the cellular or innate immune system, to be the dominant element in our quantification of antigenic distance between two influenza strains and the key factor for influenza vaccine effectiveness.



2 Evaluation of Vaccine Effectiveness

By definition, vaccine efficacy is measured by controlled trials with initially susceptible subjects, while vaccine effectiveness is measured by epidemiological observation of a susceptible population without giving a placebo (Kelly et al. 2009). Vaccine efficacy is relatively more idealized than vaccine effectiveness, because vaccine effectiveness depends on vaccine efficacy and other environmental factors (Torvaldsen and McIntyre 2002). Although the terms vaccine efficacy and vaccine effectiveness are interchangeable to some extent (Torvaldsen and McIntyre 2002), we use the term vaccine effectiveness because factors other than the vaccine strain and the dominant circulating strain are involved in the studies used in our meta-analysis. The data sources for the vaccine effectiveness calculation used influenza-like illness (ILI) as the primary endpoint, and the studies we use contained unvaccinated control groups.
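Assuming, as in the H3N2 analysis of Gupta et al. (2006), that vaccine effectiveness is computed from the ILI rate u in the unvaccinated group and the ILI rate v in the vaccinated group as E = (u − v)/u, the calculation reduces to the few lines below. The count-based function signature is an illustrative assumption, and the counts in the usage line are made up.

```python
def vaccine_effectiveness(ill_unvaccinated, n_unvaccinated, ill_vaccinated, n_vaccinated):
    """E = (u - v) / u, with u and v the ILI rates in unvaccinated and vaccinated groups."""
    u = ill_unvaccinated / n_unvaccinated
    v = ill_vaccinated / n_vaccinated
    if u == 0:
        raise ValueError("effectiveness is undefined when no unvaccinated subject is ill")
    return (u - v) / u


# Hypothetical counts: 20 of 100 unvaccinated and 5 of 100 vaccinated subjects ill.
print(round(vaccine_effectiveness(20, 100, 5, 100), 2))  # 0.75
```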

The data from four studies require additional clarification. The paper by Edwards et al. (Edwards et al. 1994) did not provide the retrospectively reported influenza-like illness data prior to the 1987-88 season, so we use the number of ill subjects presenting for throat culture to calculate the morbidity rates u and v. Note that in this study, subjects with influenza-like illness were required to present for a throat culture, so the number of such patients, broken down by vaccination status, is a reasonable estimate of the illness rates u and v. Moreover, in other seasons when both retrospective data and the number of presenting ill subjects were available, the vaccine effectiveness values calculated from the two sources are similar to each other, especially for subtype H1N1 (Edwards et al. 1994). In Grotto et al.'s study (Grotto et al. 1998), which used influenza strains from Israel in the 1995-96 season, the numbers of sampled H1N1 and H3N2 strains in Israel were 7 and 35, respectively. The samples were collected from six clinics in December, the middle of the influenza season. However, the numbers of both subjects and sampled viruses are limited. At the global level, where more data are available, H1N1 and H3N2 were observed to co-circulate with comparable frequencies, and H1N1 virus was found in North America and parts of Eurasia including Israel (Centers for Disease Control and Prevention (CDC) 1991). The proportion of H1N1 in samples during the 1995-96 season in the USA ranks fifth in H1N1 proportion among seasons since 1977 (Ferguson et al. 2003). In the same season, the H3N2 vaccine strain and the dominant circulating strain were a perfect match, so the decrease in the overall vaccine effectiveness is expected to be due to the mismatch in the H1N1 component. We therefore treat H1N1 here as a co-circulating strain and take into account the vaccine effectiveness reported in this article (Grotto et al. 1998). Keitel et al. (Keitel et al. 1997) reported that the dominant circulating strain in the 1983-84 season was A/Chile/1/83 rather than A/Victoria/7/83 as listed in this table and in other cited studies. The illness rates u and v are small, so the standard error of vaccine effectiveness is 64.8%, which is unacceptably large. Using Keitel et al.'s data for the 1983-84 season is therefore not appropriate for the vaccine effectiveness assessment. The reference by Couch et al. (Couch et al. 1996) did not provide the original data N_u, N_v, u, and v for the calculation of vaccine effectiveness; error bars of vaccine effectiveness in these seasons were calculated with other data sources.
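The Keitel et al. case shows why small illness rates inflate the error bar. One common way to approximate the standard error of E = 1 − v/u is the delta method for a ratio of two independent binomial proportions; the sketch below uses that approximation as an assumption, since the text does not state which error formula produced the 64.8% figure, and the counts in the example are hypothetical.

```python
import math


def ve_standard_error(ill_unvaccinated, n_unvaccinated, ill_vaccinated, n_vaccinated):
    """Delta-method standard error of E = 1 - v/u (an assumed approximation)."""
    if ill_unvaccinated == 0 or ill_vaccinated == 0:
        raise ValueError("the approximation needs at least one ill subject in each group")
    u = ill_unvaccinated / n_unvaccinated
    v = ill_vaccinated / n_vaccinated
    # Var[ln(rate)] for a binomial proportion is about (1 - rate) / (number ill)
    var_log_ratio = (1 - u) / ill_unvaccinated + (1 - v) / ill_vaccinated
    # SE(E) equals SE(v/u), approximated as (v/u) * SD[ln(v/u)]
    return (v / u) * math.sqrt(var_log_ratio)


# Same illness-rate ratio v/u = 0.5, very different numbers of cases (hypothetical).
print(ve_standard_error(2, 100, 1, 100))        # few cases: large error bar
print(ve_standard_error(200, 1000, 100, 1000))  # many cases: small error bar
```

Both examples have the same point estimate, E = 0.5, but with only a handful of ill subjects the approximate standard error is roughly ten times larger, which is the situation that made the 1983-84 data unusable.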



3 Robustness of the p_epitope Model

Influenza vaccine effectiveness may depend not only on the antigenic distance between the vaccine strain and the dominant circulating strain quantified by p_epitope, but also on the percentage of people vaccinated, the time of vaccination in the influenza season, influenza virus transmissibility and reproduction rate, and individuals' immune histories. Thus, development of the public health system and a growing fraction of the population being vaccinated may produce a time trend in both H1N1 and H3N2 vaccine effectiveness, and the vaccine effectiveness statistics could be biased by these factors. Nevertheless, more than 50% of the variance in H1N1 and H3N2 vaccine effectiveness is explained by p_epitope, since R^2 > 1/2. To show that the model of vaccine effectiveness reduces well to a linear relation between p_epitope and vaccine effectiveness, we calculated the residuals of the linear regression of vaccine effectiveness on p_epitope and performed another linear regression of these residuals versus year. The trend line of the H1N1 residuals has a slope of -0.0002/year, and the null hypothesis that the slope equals zero is not rejected (p = 0.96), as shown in Figure S1. The residuals of H3N2 vaccine effectiveness (Gupta et al. 2006) were also regressed on year; the slope of -0.0013/year is not significantly different from zero (p = 0.58), as shown in Figure S2. Therefore the contribution of simple time-dependent factors other than p_epitope to H1N1 and H3N2 vaccine effectiveness in humans is negligible, and our analysis suggests that the vaccine effectiveness data in this paper are negligibly affected by these potential biases.

Figure S1: The linear regression (R^2 = 0.0003) of the residuals of H1N1 vaccine effectiveness versus year. Data from Table 1 and Figure 2 in the main text. The slope of the trend line is -0.0002/year. ANOVA test: H0: slope = 0, F = 0.0021, and p = 0.96. The null hypothesis that these residuals are independent of time cannot be rejected.

Despite the limited number of available data points in this study, the correlation line between p_epitope and vaccine effectiveness is statistically meaningful. In Figure 2 in the main text, the trend line is vaccine effectiveness = -1.19 p_epitope + 0.53, whose greatest determinant is the 1986-87 (b) data point. If this data point is removed, the trend line becomes vaccine effectiveness = -1.63 p_epitope + 0.54, which is not fundamentally different from the original trend line. At the 1986-87 (b) data point, the difference between the vaccine effectiveness values predicted by these two models is 0.13, roughly one standard error. In reality, most p_epitope values are less than 0.1, so most differences between the two predicted vaccine effectiveness values are less than 0.034, which is within the noise level of the epidemiological measurements.
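The sensitivity argument can be checked directly. Subtracting the two trend lines gives a gap of |0.44 p_epitope − 0.01|, which grows with p_epitope and reaches 0.034 at p_epitope = 0.1. The Python sketch below encodes the two fitted lines quoted above; the function names are hypothetical.

```python
def ve_all_points(p):
    """Trend line fitted to all H1N1 data points (Figure 2)."""
    return -1.19 * p + 0.53


def ve_without_1986_87b(p):
    """Trend line refitted after removing the 1986-87 (b) data point."""
    return -1.63 * p + 0.54


def prediction_gap(p):
    """Absolute difference between the two predicted vaccine effectiveness values."""
    return abs(ve_all_points(p) - ve_without_1986_87b(p))


# Largest gap over p_epitope in [0, 0.1], sampled on a grid of step 0.001.
worst = max(prediction_gap(i / 1000) for i in range(101))
print(round(worst, 3))  # 0.034
```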
Figure S2: The linear regression (R^2 = 0.018) of the residuals of H3N2 vaccine effectiveness versus year. Data from (Gupta et al. 2006). The slope of the trend line is -0.0013/year. ANOVA test: H0: slope = 0, F = 0.32, and p = 0.58. The null hypothesis that these residuals are independent of time cannot be rejected.
The basis for calculating p_epitope is a set of well-defined epitopes. An early definition of the five epitopes in H1 hemagglutinin (Caton et al. 1983) did not identify numerous amino acid positions at which mutations were frequently selected in history. These positions are presumably under strong antigenic pressure to be selected for escape mutation. The more recent definition of H1 epitopes incorporates these additional amino acid positions as well as amino acids from the epitopes of H3 hemagglutinin (Deem and Pan 2009). Additional experiments on the H1 epitopes will likely allow further refinement of the calculation of p_epitope. Only nine epidemiological data points are available since the reemergence of H1N1 virus in humans in 1977, and the p_epitope model parameters may be further improved as epidemiological data accumulate.

The antigenic properties are determined by a small number of amino acid substitutions, because the positions and the amino acids introduced by mutation have distinct effects on the change of antigenic distance between the vaccine strain and the dominant circulating strain. For example, mutations yielding charged amino acids in the dominant epitope are favorable for the virus, and may be the key amino acid substitutions for the antigenic properties (K. Pan et al., submitted). An improved sequence-based model might weight different amino acid substitutions by the decrease in the binding constant between HA and antibody, obtained from free energy calculations (K. Pan and M. W. Deem, submitted). With current knowledge, the less precise but safe p_epitope model assigns weight one to amino acid substitutions in the dominant epitope and weight zero to all other amino acid substitutions. The p_epitope model nevertheless correlates with vaccine effectiveness better than the antisera data do. That is, for both H1N1 and H3N2, the p_epitope method is superior to other methods in current use.
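The weighting just described (weight one for substitutions in the dominant epitope, zero elsewhere) can be sketched as follows, assuming, as in the H3N2 work of Gupta et al. (2006), that each epitope's distance is the fraction of its positions that differ and that the dominant epitope is the one with the largest fraction. The epitope position sets in the example are hypothetical placeholders, not the H1 epitope definitions of Deem and Pan (2009), and the sequences are assumed to be pre-aligned; `p_epitope` here is an illustrative function, not a published API.

```python
def p_epitope(vaccine_seq, circulating_seq, epitopes):
    """Return (dominant epitope name, fraction of its positions that differ).

    epitopes maps an epitope name to a collection of 0-based alignment positions.
    Substitutions outside the dominant epitope get weight zero.
    """
    if len(vaccine_seq) != len(circulating_seq):
        raise ValueError("sequences must be aligned to the same length")
    best_name, best_fraction = None, 0.0
    for name, positions in epitopes.items():
        positions = list(positions)
        if not positions:
            continue
        changed = sum(1 for i in positions if vaccine_seq[i] != circulating_seq[i])
        fraction = changed / len(positions)
        # keep the epitope with the largest fractional change (first one wins ties)
        if best_name is None or fraction > best_fraction:
            best_name, best_fraction = name, fraction
    return best_name, best_fraction


# Hypothetical 6-residue alignment with two made-up epitopes.
print(p_epitope("ACDEFG", "ACDKFW", {"A": [0, 1, 2, 3], "B": [4, 5]}))  # ('B', 0.5)
```

Epitope A has one substitution out of four sites (0.25) and epitope B one out of two (0.5), so B is dominant and the substitution in A contributes nothing.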



4 Comparison of the p_epitope Model and the HI Assay for H3N2 Virus

In some cases the p_epitope model detects antigenic variants better than the HI assay. In the 2003-04 Northern Hemisphere flu season, the majority of isolated H3N2 strains were similar to A/Fujian/411/2002 by HI assay (WHO 2004); hence WHO recommended an A/Fujian/411/2002-like strain as the 2004-05 Northern Hemisphere H3N2 vaccine component, and A/Wyoming/3/2003 was selected. Although A/Wyoming/3/2003 is similar to the A/Fujian/411/2002 circulating in 2004-05 ("antigenically equivalent" by HI data (Harper et al. 2004)), the vaccine effectiveness was only moderate (Gupta et al. 2006). Interestingly, the p_epitope between A/Fujian/411/2002 and A/Wyoming/3/2003 was also moderate (p_epitope = 0.095), predicting the moderate vaccine effectiveness. In fact, the p_epitope method can also detect antigenic variants more rapidly as they emerge (He and Deem 2010).